AI based Electrical Component Identifier

Description
Electrical component themed AI detection and identification.

Hardware
A mounted camera above a surface (part of the product). Produces a controlled-environment live feed for the application.

Software
An application running inference on a live USB camera feed (optionally an imported picture or video).

Training
- Training through YOLO.
- Augmentation of the acquired data to simulate differences in the environment and to provide imperfections to train against.
- Training against the raw and augmented data.
- If the results are not good enough, more data may be gathered, or the data augmented further, followed by retraining.
- Augmentation examples: addition of glare, rotation, blurring, addition of spots.

Application
- GUI: based on Qt Creator, using C++.
- Inference: running on C++, utilising Ultralytics YOLOv8.

Summary
- Detection via inference: detect and display boundaries for each identified class from the input image using inference.
- Identification: post-processing of the components in the bounding boxes detected by inference, which may carry additional information that can be identified by a variety of approaches. Examples:
  - LEDs: LED color.
  - Resistors: resistor code value.
  - IC components: pin count; information written on the component.

Features
Inference. Classes to train the model to detect: Resistor, Diode, Capacitor (AC, DC), LEDs, Integrated Circuits, LDR.

Milestones
- Base camera rig
- Initial inference model training
- Inference running
- Testing with video footage from a mobile device

Research
- Models: Ultralytics YOLO
- Live labeling
- Focus audience

Set Rig
The set position of the camera, the significant reduction in distance between the camera and the objects, the consistency of the lighting provided by the ring light, and the static background all boost the confidence of the inference considerably.
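The augmentation examples listed above (glare, rotation, blurring, spots) are each simple image transforms. As a minimal illustration, here is a pure-Python sketch of the "addition of spots" augmentation on a grayscale image stored as a nested list; the function name and parameters are illustrative only, and a real pipeline would use OpenCV or the augmentation built into YOLO.

```python
import random

def add_spots(image, count=3, radius=1, intensity=255, seed=None):
    """Return a copy of a grayscale image (rows of 0-255 ints) with
    `count` bright square spots stamped at random positions."""
    rng = random.Random(seed)
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]          # leave the original untouched
    for _ in range(count):
        cy, cx = rng.randrange(h), rng.randrange(w)
        for y in range(max(0, cy - radius), min(h, cy + radius + 1)):
            for x in range(max(0, cx - radius), min(w, cx + radius + 1)):
                out[y][x] = intensity
    return out

# A flat dark image gains two bright "dust" spots
img = [[10] * 8 for _ in range(8)]
spotted = add_spots(img, count=2, radius=1, seed=42)
```

Glare can be sketched the same way with larger, semi-transparent blobs, and blurring with a small box filter; the point is that each imperfection is cheap to synthesise from clean rig images.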
Training (YOLOv5)

1st batch (test run)
- Image count: 100 training, 20 testing
- Classes (1 total): resistor
- Augmentation: default
- Epoch count: 600
- Average time per epoch: 34 seconds

2nd batch
- Image count: 1800 training, 540 testing
- Classes (9 total): red_led, green_led, blue_led, yellow_led, ac_capacitor, dc_capacitor, resistor, sip_resistor, pcb_terminal
- Augmentation: default
- Epoch count: 600

3rd batch
- Image count: 2393 training, 724 testing
- Classes (10 total): red_led, green_led, blue_led, yellow_led, ac_capacitor, dc_capacitor, resistor, sip_resistor, pcb_terminal, metal_nut
- Augmentation: default
- Epoch count: 2000

YOLOv8

Set rig rationale
Due to the angle and lighting both being known and mostly fixed thanks to the set rig, the input dataset does not need to cover angles and lighting conditions beyond what the rig will expose the model to during runtime. The sum of all the points covered here results in a significant reduction in the data required to train, when compared to a setup without a set rig, for equivalent confidence values during runtime.

Angle range
The angle range is reduced to a top-down view only, eliminating the rest of the angle range. Only the components being detected need to be captured from all angles, as opposed to the camera gathering the dataset needing to be positioned at different angles. A top-down view also eliminates the majority of glare issues caused by high-luminosity bodies, such as clouds or the sun.

Lighting
While the lighting will change depending on room conditions, the ring light around the camera provides significant consistency. This does not eliminate the need to train against various lighting conditions, but it does reduce their significance and increase the certainty of the detection.

Distance
A set rig significantly limits the distance the objects will be from the camera during runtime, allowing for further confidence in the predictions.
Static background
Apart from dust or unexpected objects on the rig's surface, which should be removed before usage, the background the objects sit in front of stays mostly consistent. This reduces the need to gather data of the same object against backgrounds that are not expected during runtime.

Focus audience
While this project may be retrained and refocused for many different fields, it is trained for electrical component identification, and is therefore aimed at engineers. The project focuses on both existing engineers and those interested in becoming engineers. Having quick identification of components, a count of each, and any potential additional information provided by the project saves time otherwise spent manually analysing this information.

Average time per epoch: 2 minutes (2nd batch); 2 minutes 20 seconds (3rd batch).

Acronyms
- SIP: Single Inline Package (as in the sip_resistor class)
- GPU: Graphics Processing Unit
- CPU: Central Processing Unit
- AI: Artificial Intelligence
- LDR: Light Dependent Resistor
- LED: Light Emitting Diode
- AC: Alternating Current
- DC: Direct Current
- PCB: Printed Circuit Board (as in the pcb_terminal class)

LED color
The most prominent color may be identified by sorting all the colors in the image into their hue values and checking which hue is most active.

Resistor color codes
The color codes can be identified by processing the image with filters until only the prominent colors remain. The positions of the color bands relative to the body of the resistor then give the specific positions and order of the bands, which can be processed into the actual ohm value.

IC pin count
The pin count can be identified by processing the image with filters until there is a clear contrast between the body of the chip and the pins. One approach that could help identify the number of pins is drawing a line between two of the pins and counting how many pins touch this line.
Taking the line that touches the most pins gives the pin count of the IC. OCR may also be used to read any text printed on the component.
- OCR: Optical Character Recognition — software-based reading of characters from an image that contains written text.

LED color identification approaches (from the input image)

Inference method
Different color LEDs may be trained as individual classes. Disadvantage: requires training for each individual LED color separately, as opposed to one generic LED class.

Algorithm method
Process the raw input with a high-contrast filter, then build a color histogram; a clearly prominent color (for example, yellow) can then be read off. Advantage: works on any LED. Disadvantage: may give false information if the background is too vibrant.

Approaches after filtering
- Contrast approach.
- HSV approach: take the average of the hue values of all pixels whose value is above a certain threshold; around 0.7 on a range from 0 to 1 should be appropriate. Hue is in the range of 0 to 360 degrees. In the accompanying figure, the pink dots represent the pixel values obtained from the previous step. Taking the average of this data, the result lands in a degree range that can easily be determined as yellow, by separating the hue circle into color sections by degree ranges. Pure yellow sits at 60° on the standard hue circle, so hues in a band around it can be classed as yellow. Note: this example ignores colors darker than 0.7, on a value range of 0 to 1.

HSV, or Hue Saturation Value, is an alternative way to represent colors. It can be advantageous over RGB in situations such as this.
- RGB: Red Green Blue — commonly used to refer to defining colors by their red, green and blue components.
- HSV: Hue Saturation Value — commonly used to refer to defining colors by their hue, saturation and value components.

Live labeling
The ability to take a snapshot of the current frame, define appropriate labels, and save the labeled snapshot for future training, all from inside the GUI. Alternatively, taking snapshots from the GUI and saving them for later labeling.

The milestones that follow are sorted from highest priority to lowest.
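The pin-counting heuristic above (draw a line across the pins and count how many it touches) can be sketched on a binary mask, assuming the image has already been filtered so that pin pixels are 1 and everything else is 0. Scanning every candidate row and taking the one that crosses the most pin-runs gives the count; the function names are illustrative, and a real implementation would work on a thresholded OpenCV image.

```python
def runs_on_line(row):
    """Number of contiguous runs of 1s in a binary row (each run = one pin)."""
    runs, inside = 0, False
    for v in row:
        if v and not inside:
            runs += 1
        inside = bool(v)
    return runs

def pin_count(mask):
    """Scan every row of a binary mask and return the maximum number of
    runs, i.e. the count from the line that touches the most pins."""
    return max(runs_on_line(row) for row in mask)

# Toy mask: one 4-pin edge of an IC (1 = pin pixel)
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [1, 0, 1, 0, 0, 1, 0, 1, 0],
    [1, 0, 1, 0, 0, 1, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
```

Taking the maximum over all rows is exactly the "line that touches the most pins" rule from the text; rows that miss the pins simply contribute lower counts.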
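The HSV hue-averaging approach described above can be sketched with only the standard library's colorsys module. One caveat worth building in: hue is circular (0° and 360° are the same red), so a plain arithmetic mean misbehaves near the wrap-around; the sketch below therefore averages unit vectors. The 0.7 value threshold follows the text; the function name is illustrative.

```python
import colorsys
import math

def dominant_hue(pixels, value_threshold=0.7):
    """Circular mean (in degrees) of the hues of all RGB pixels whose
    HSV value exceeds `value_threshold`. RGB components are 0-255."""
    sx = sy = 0.0
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        if v > value_threshold:  # ignore dark background pixels
            sx += math.cos(math.radians(h * 360))
            sy += math.sin(math.radians(h * 360))
    return math.degrees(math.atan2(sy, sx)) % 360

# Mostly-yellow LED patch plus a few dark background pixels
# (the dark pixels fall below the value threshold and are ignored)
patch = [(250, 240, 40)] * 20 + [(30, 30, 30)] * 5
hue = dominant_hue(patch)  # lands near 60 degrees, i.e. yellow
```

Classifying the result is then a matter of checking which degree band of the hue circle the mean falls into.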
Rig
Setting up the camera on a rig.

Base GUI
A GUI with the essentials to interface with the camera through USB:
- A live display from the camera on the rig.
- The ability to take images by pressing a button.
- Support for running inference.

Initial dataset gathering
~100 images of a single class, taken from the rig, for initial training and testing of the model. For the purpose of testing inference on the rig; a proof of concept. The results will not be perfect, as the dataset is minimal and only contains one class.

Further dataset gathering
At least 250 pictures of each class of every component that the project is designed to detect.

Further model training
This training will take considerably longer than the initial training: around 2 minutes per epoch, and it should be run for at least 300 epochs. The initial training should not take long at all, and does not need to be polished; training for ~100 epochs should be sufficient, with each epoch taking ~20 seconds on the machine available.

Model Training via Deep Learning
Machine used: personal computer.
- CPU: AMD Ryzen 7 5800X3D — 8 cores, 16 threads, 3.4 GHz base clock frequency, 96 MB L3 cache, 90 °C maximum operating temperature.
- GPU: GeForce RTX 3060 Ti — 8192 MB GDDR6X memory, 4864 CUDA cores, 1.41 GHz base clock frequency.

The goal is to reach confidence values of 0.8 on a range of 0 to 1.

Post-processing
The ability to gather further information from the detection bounding boxes provided by the inference.

Mobile
After the previous steps are in good shape, investigation of moving the inference to a mobile device will begin. If the confidence values are not up to standard, more data will be gathered from this and potentially other mobile devices, followed by further training, until the results are adequate. If adequate results are achieved before the project deadline, deployment to a mobile device will be started. If the frame rates are not sufficient, the inference may be run on still images to improve the user experience.
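The per-epoch estimates above translate into concrete wall-clock budgets worth computing up front: ~100 epochs at ~20 seconds is roughly half an hour for the proof-of-concept run, while 300 epochs at ~2 minutes is about 10 hours for the full training pass. A trivial helper makes the arithmetic explicit (the function name is illustrative):

```python
def training_hours(epochs, seconds_per_epoch):
    """Estimated wall-clock training time in hours."""
    return epochs * seconds_per_epoch / 3600

initial = training_hours(100, 20)   # proof-of-concept run: ~0.56 hours
full = training_hours(300, 120)     # full training pass: 10 hours
```

Budgets like these are what make the choice between a local machine and a rented server (discussed later) a practical rather than theoretical question.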
Optional: the ability to label the images from the device, without requiring external software. Given the timeframe of the project, it may be advantageous to instead gather data during a session and label it afterwards.

Memory
- 2x16 GB DDR4, 3.6 GHz
- Brand: Corsair. Name: Vengeance RGB PRO SL
- Link: https://www.corsair.com/eu/en/Categories/Products/Memory/Vengeance-RGB-PRO-SL-Black/p/CMH32GX4M2E3200C16

CPU
- Brand: AMD. Name: Ryzen 7 5800X3D
- Link: https://www.amd.com/en/products/cpu/amd-ryzen-7-5800x3d

GPU
- Brand: NVIDIA. Series: 30. Name: RTX 3060 Ti
- Link: https://www.nvidia.com/en-gb/geforce/graphics-cards/30-series/rtx-3060-3060ti/

CUDA
Special cores that are designed for compute-intensive tasks. They run in parallel with the CPU, and may also run in parallel across multiple GPUs. They are perfect for deep learning, as deep learning is incredibly compute-intensive. Deep learning training times are predictable and stay mostly constant between epochs; there are no race conditions, so the more processing power available, the quicker an epoch will finish.

Each of these steps should be polished before continuing to the next one, to provide a solid foundation for the next step to build on.

Model Training via Deep Learning — Analysis

Brief history
YOLO, which stands for You Only Look Once, is a popular image segmentation and object detection model originally developed by Joseph Redmon and Ali Farhadi.

YOLOv1
The first version was released in 2015, and it very quickly became popular due to its significantly superior speed and accuracy compared to other architectures.

YOLOv4
Released in 2020, introducing Mosaic data augmentation and a new, improved loss function, decreasing the time taken to achieve good results with the trained model.
YOLOv5
Released in 2020, introducing support for object tracking, which allows following a moving object, and panoptic segmentation, which allows identification of overlapping objects with accurate bounding boxes.

Ultralytics YOLOv8
The latest version of YOLO as of writing. YOLOv8 is a state-of-the-art model that builds upon the already very successful previous YOLO versions, introducing new performance and flexibility features. It offers full support for previous YOLO versions, making it incredibly convenient for existing users to take advantage of the new features.

Versions comparison
In general, YOLOv8 is superior to all of its predecessors. While YOLOv5 mostly underperforms when compared to the later versions, it is important to note how minimal the delays are even on a version that is now outdated.

YOLO offers pretrained models that are used as starting points for training custom models. Each model has its advantages and disadvantages, and should be picked depending on the project.

Model properties
- Size: the pixel height and width the model operates up to.
- mAP: single-model, single-scale values while detecting on the COCO val2017 dataset.
- Speed: averaged time taken using the Amazon EC2 P4d instance on the COCO dataset.
- Params (in millions): the number of parameters that are tweaked per epoch while training, and processed during inference.
- FLOPs: a measure based on floating point operations that is relevant in the field of deep learning.

Diminishing returns can be observed in the mAP values when compared to the time taken (speed). In some circumstances, maximum precision is essential and is prioritised over the hardware requirements; that is when a larger model should be chosen. In the scope of this project, the YOLOv8m model has been chosen. The rationale behind this choice is to take advantage of the high mAP value while not increasing the time taken too much, in preparation for a future mobile deployment of the model.
Comparing the YOLOv5 and YOLOv8 versions, a clear advantage can be seen when taking into account the size of the model (parameter count), the resulting mAP output, and the time taken.

Architecture choice
YOLO has been chosen as the architecture this project utilises for AI detection. At the start of the project, there was already a strong bias towards YOLO due to highly positive past experience with YOLOv5 and the features it offers. Upon the release of YOLOv8, with its superior features and specifications on top of the previous versions, YOLOv8 was the obvious choice of architecture for the project.

Description
As the name suggests, YOLO focuses on detecting multiple classes in a single "look": a single analysis of the entire input image. Compared to many architectures that came before YOLO, this is a far superior approach, because no matter how quick those architectures may be, they re-analyse the entire image for every single class the model was trained on, increasing the time taken per detection additively with each class. An approach like this may seem too good to be true, as if it should come at a significant cost to the speed and confidence of the model, but when the results are analysed, that could barely be further from the truth: YOLO is an incredibly efficient and accurate architecture. These days most sophisticated architectures approach object detection similarly to YOLO, but YOLO remains a state-of-the-art architecture that continues to improve and grow.

Internal AI object detection steps
- Classification: the identification of a part of an image believed to contain an item of a class the model was trained to detect.
- Object detection: the bounding, by a box, of the classified segments of the image.
- Segmentation: the process of identifying the exact boundary of the detected item.
Visual examples of augmentation include resizing and the joining up of multiple images to create new ones (mosaic). The reduction in the data required to train makes it feasible to train relatively high-quality models from data gathered at home.

Hardware
- Raspberry Pi — type: standalone. Specifications: processor base frequency 700 MHz, memory 2 GB, GPU core count 4, GPU max frequency 700 MHz.
- BeagleBone — type: standalone. Specifications: GPU core count 2, GPU max frequency 532 MHz.
- Nvidia Jetson Nano — type: standalone. Specifications: GPU core count 128, GPU max frequency 921 MHz.
- Intel Neural Compute Stick 2 — type: extension. Specifications: SHAVE core count 16. Advantage: offers computational power through a USB connection, so it can be used to run inference on existing devices, such as a laptop.
- Android phone — specifications depend on the specific device. Benefits: widely and easily accessible; on average, superior to the alternatives; has a built-in camera that is considerably more convenient than the alternatives.

Marking codes
- Resistors and inductors: color coded.
- Capacitors: number coded.
- ICs: manufacturer-specific markings.

YOLO: You Only Look Once — the image detection architecture the project is based on.

Training hardware
Training uses the CPU cores together with the CUDA cores provided by the GPU.

Personal computer vs rented dedicated server

Personal computer
- Advantages: local — provided a local machine is already owned, it is immediately available; pictures are taken on the machine itself, so there are no upload/download times.
- Disadvantages: speed — compared to a sophisticated server running many GPUs, a local machine, likely containing one or maybe two GPUs, will most likely process the training at a slower rate.

Rented dedicated server
- Advantages: utilises multiple GPUs — quicker epoch computations, resulting in quicker training; cloud based — allows for parallel computing, as opposed to occupying your personal computer at home.
- Disadvantages: cloud-based upload and download times — datasets tend to be considerably big in size; a smaller dataset of ~2000 images takes up ~3 GB, which is not a significant amount for a local machine to transfer, but is considerable for uploading; cost — the bigger the server, the higher the rates become.
Cost
As opposed to a rented server, acquiring your own machine has the benefit of owning it and being able to use it indefinitely (or until it eventually breaks). While the initial cost of acquiring a machine adequate for deep learning is higher than renting a server for a few months, it is a worthwhile long-term investment in a machine that can be used for a variety of casual or intensive tasks.

Devices discussion
When compared to training, using the trained model to run inference is considerably quicker.

R-CNN
R-CNN, which stands for Region Based Convolutional Neural Networks, was developed by Ross Girshick and released in 2013. Like other object detection architectures, R-CNN takes an input image and outlines bounding boxes where it believes an item of a certain class is present.
- Disadvantages: not real-time; on average, it takes 47 seconds to process a single frame.
- Discussion: R-CNN has successors called Fast R-CNN and Faster R-CNN; however, even the fastest of these still barely manages 5 frames per second at best. While 5 frames per second is an impressive and definitely usable result, there are alternative architectures that offer a significant improvement in inference time.

SSD
SSD, which stands for Single Shot Detector, was released in 2017 and developed mostly by Max deGroot and Ellis Brown.
- Discussion: offers great frame rates, averaging 45 frames per second when tested on a now relatively old graphics card, the NVIDIA GTX 1060.
- Disadvantages: according to the Git repository, the project was seemingly abandoned about four years ago.
YOLOv8 discussion
One of the most feature-rich, cutting-edge, state-of-the-art and popular architectures in use today.

- CPU: the component of a computer where the core computations are processed.
- GPU: an optional component of a computer that is dedicated to, and optimised for, computing graphical tasks.

Labeling
Existing labeling software offers quality-of-life features, such as rough auto-labeling of the images, which only requires the user to adjust the bounding boxes and confirm their validity, rather than having to define the boxes from start to finish.

Inference examples
- Architecture YOLOv5, pretrained model yolov5s: surprisingly good results for a model trained from 120 images, with confidence values above 0.8 and sometimes over 0.9.
- Architecture YOLOv5, pretrained model yolov5m: rather poor results; confidence values usually below 0.7, and it struggled to classify accurately.
- Architecture YOLOv5, pretrained model yolov5m: great results, with confidence values consistently above 0.8, classifying all classes accurately.

Technology utilised: deep learning computation with CPU cores and GPU CUDA cores running in parallel.

220 Ohm resistor example
Color codes: red = 2 (digit bands), brown = 1 (multiplier, x10), gold = 5% tolerance.

100 nF capacitor example.

IC markings
Unfortunately, for the purposes of automatic identification of integrated circuit markings, most IC manufacturers do not follow any global standard for marking their ICs; most tend to have their own internal marking standards. Due to this fact, only known markings can be used to identify components. The mixed-manufacturer ICs example illustrates the vast variation, and the lack of markings that are identifiable without access to datasheets.
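The 220 Ohm example above follows the standard 4-band resistor code: two digit bands, a multiplier band, and a tolerance band. A sketch of the lookup, using the standard digit values (black 0 through white 9); the function name is illustrative:

```python
# Standard resistor color-code digit values: black=0, brown=1, ..., white=9
DIGITS = ["black", "brown", "red", "orange", "yellow",
          "green", "blue", "violet", "grey", "white"]
TOLERANCE = {"gold": 5.0, "silver": 10.0}  # percent

def decode_resistor(band1, band2, multiplier, tolerance):
    """Ohm value and tolerance (%) of a 4-band resistor."""
    value = (DIGITS.index(band1) * 10 + DIGITS.index(band2)) * 10 ** DIGITS.index(multiplier)
    return value, TOLERANCE[tolerance]

# Bands red-red-brown-gold: (2*10 + 2) * 10^1 = 220 Ohm, +/-5%
ohms, tol = decode_resistor("red", "red", "brown", "gold")
```

This is the final step of the pipeline described earlier: once the band colors and their order have been extracted from the bounding box, the ohm value is a pure table lookup.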
PCB
A board with parts of its conductive layer etched away, leaving conductive tracks only in specific positions that are pre-planned using CAD software. Widely used to implement electronic circuits.

CAD
Computer Aided Design. CAD software accelerates and automates design work in various fields: instructions are given to the computer through intuitive, usually GUI-based interactive programs.

Definitions
- AC: electrical current that oscillates.
- DC: electrical current that stays constant.
- LED: an electrical component that emits light when current is passed through the circuit.
- LDR: a resistor that varies in resistance relative to the amount of light the body of the component is exposed to.

A ring light has been added for both training and running inference.

Historical issues encountered

Augmentation rotation issue
Description: a glitch in the augmentation provided by YOLOv5, where rotation during augmentation shifted the bounding boxes of the components, causing inaccurate feedback to the model and preventing it from training appropriately.

Example:
- Expected bounding boxes after rotation augmentation: note the snug fit of the bounding box around the edges of the component. That is desirable, as it provides accurate information on what the model should be looking for.
- Actual bounding boxes after rotation augmentation: note the unnecessarily expanded bounding boxes. These train the model in undesirable ways, detecting parts it should not.

Submitted GitHub issue: https://github.com/ultralytics/yolov5/issues/10639
Information gathered from replies as of today's date: this issue has been reported to be part of YOLOv7's augmentation as well.
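The expanded-box symptom above follows directly from the geometry of axis-aligned boxes under rotation: if a tight box is rotated and a new axis-aligned box is fitted around its corners, the result is necessarily larger than the object unless the labels are re-tightened to the object itself. A small sketch (illustrative function name) makes the effect concrete:

```python
import math

def rotated_aabb(w, h, degrees):
    """Width and height of the axis-aligned box that encloses a w x h
    box rotated about its centre by the given angle."""
    a = math.radians(degrees)
    new_w = abs(w * math.cos(a)) + abs(h * math.sin(a))
    new_h = abs(w * math.sin(a)) + abs(h * math.cos(a))
    return new_w, new_h

# A snug 10 x 4 resistor box rotated by 45 degrees balloons to ~9.9 x 9.9,
# more than doubling its area while the component inside stays the same size.
w, h = rotated_aabb(10, 4, 45)
```

This is why rotation augmentation must re-fit boxes to the rotated object (for example from a segmentation mask), not merely rotate and re-enclose the old box, which appears to be the behaviour reported in the linked issue.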